Overview

Dataset statistics

Number of variables: 19
Number of observations: 31648
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 12.8 MiB
Average record size in memory: 424.8 B
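
As a quick sanity check on the figures above, the average record size should be roughly the total memory divided by the number of observations. A minimal sketch, assuming 1 MiB = 1,048,576 bytes and that both reported sizes are rounded to one decimal place (the variable names are illustrative):

```python
# Values copied from the dataset statistics above.
n_observations = 31648
total_size_mib = 12.8        # rounded in the report
avg_record_size_b = 424.8    # rounded in the report

BYTES_PER_MIB = 1024 * 1024  # assumption: binary mebibytes

# Average record size implied by the (rounded) total size.
implied_avg_b = total_size_mib * BYTES_PER_MIB / n_observations

# Total size implied by the (rounded) average record size.
implied_total_mib = avg_record_size_b * n_observations / BYTES_PER_MIB

print(f"implied average record size: {implied_avg_b:.1f} B")
print(f"implied total size: {implied_total_mib:.2f} MiB")
```

The two figures agree to within the rounding of the reported total.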

Variable types

NUM: 14
CAT: 5

Reproduction

Analysis started: 2020-05-31 13:11:35.792257
Analysis finished: 2020-05-31 13:12:02.482025
Duration: 26.69 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

country has a high cardinality: 78 distinct values (High cardinality)
state has a high cardinality: 69 distinct values (High cardinality)
city has a high cardinality: 5905 distinct values (High cardinality)
10k is highly correlated with 5k and 9 other fields (High correlation)
5k is highly correlated with 10k and 8 other fields (High correlation)
20k is highly correlated with 5k and 9 other fields (High correlation)
half is highly correlated with 5k and 9 other fields (High correlation)
25k is highly correlated with 5k and 9 other fields (High correlation)
30k is highly correlated with 5k and 9 other fields (High correlation)
35k is highly correlated with 5k and 9 other fields (High correlation)
40k is highly correlated with 5k and 9 other fields (High correlation)
official is highly correlated with 5k and 9 other fields (High correlation)
pace is highly correlated with 5k and 9 other fields (High correlation)
overall is highly correlated with 10k and 9 other fields (High correlation)
genderdiv is highly correlated with overall (High correlation)
bib has unique values (Unique)
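
The high-correlation warnings above flag variables that move together. A minimal sketch of how such a check could work, assuming Pearson correlation and a cutoff of 0.9; both are assumptions, since the report does not state which coefficient or threshold produced these warnings. The function names and toy data are illustrative:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| above threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


# Toy data (not taken from the report): two split times that rise together
# and an unrelated age column.
toy = {
    "5k": [0.10, 0.20, 0.30, 0.40, 0.50],
    "10k": [0.21, 0.39, 0.62, 0.80, 1.00],
    "age": [0.5, 0.1, 0.9, 0.2, 0.4],
}
flagged = highly_correlated_pairs(toy)
for a, b, r in flagged:
    print(f"{a} is highly correlated with {b} (r = {r:.3f})")
```

Cumulative split times in a race are expected to correlate strongly with one another, which is consistent with nearly every timing column being flagged.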

Variables

5k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1433
Unique (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.22636710970294452
Minimum: 0.0
Maximum: 1.0000000000000002
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.4 KiB
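
A minimum of 0, a single zero, and a maximum within rounding error of 1 suggest that this column was min-max scaled before profiling. That is an inference; the report does not say how the data were prepared. A minimal sketch of such scaling in multiply-then-shift form, where floating-point rounding can leave the largest value a hair above or below 1.0 (the function name and sample values are hypothetical):

```python
def min_max_scale(values):
    """Scale values linearly so the smallest maps to 0 and the largest to about 1."""
    lo, hi = min(values), max(values)
    scale = 1.0 / (hi - lo)
    offset = -lo * scale
    # Multiply-then-shift: the result for the largest value is subject to
    # floating-point rounding, so it may differ from 1.0 in the last digits.
    return [v * scale + offset for v in values]


# Hypothetical 5k split times in minutes (not taken from the report).
raw_5k_minutes = [16.62, 25.1, 31.4, 45.0, 86.95]
scaled = min_max_scale(raw_5k_minutes)
print(scaled)
```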

Quantile statistics

Minimum: 0
5-th percentile: 0.1525252525
Q1: 0.1877525253
Median: 0.2184343434
Q3: 0.2607323232
95-th percentile: 0.3207070707
Maximum: 1
Range: 1
Interquartile range (IQR): 0.07297979798

Descriptive statistics

Standard deviation: 0.05322533827
Coefficient of variation (CV): 0.2351284086
Kurtosis: 1.771341124
Mean: 0.2263671097
Median Absolute Deviation (MAD): 0.03535353535
Skewness: 0.5653030277
Sum: 7164.066288
Variance: 0.002832936634
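
Several of the statistics reported for 5k are related by simple identities: the variance is the squared standard deviation, the coefficient of variation is the standard deviation divided by the mean, the IQR is Q3 minus Q1, and the sum is the mean times the number of observations. A short check using the reported (rounded) values, so comparisons use a tolerance rather than exact equality:

```python
import math

# Values copied from the 5k statistics and the dataset overview.
n = 31648
distinct = 1433
mean = 0.2263671097
std = 0.05322533827
variance = 0.002832936634
cv = 0.2351284086
q1, q3 = 0.1877525253, 0.2607323232
iqr = 0.07297979798
total = 7164.066288

checks = {
    "variance = std^2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "CV = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "IQR = Q3 - Q1": math.isclose(q3 - q1, iqr, rel_tol=1e-6),
    "sum = mean * n": math.isclose(mean * n, total, rel_tol=1e-6),
    "unique % = distinct / n": round(100 * distinct / n, 1) == 4.5,
}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```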
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.2042929293            81      0.3%
0.2036616162            79      0.2%
0.1986111111            78      0.2%
0.2061868687            75      0.2%
0.2111111111            74      0.2%
0.2064393939            72      0.2%
0.1946969697            72      0.2%
0.2011363636            72      0.2%
0.2060606061            71      0.2%
0.2108585859            71      0.2%
0.2095959596            70      0.2%
0.2054292929            70      0.2%
0.2070707071            69      0.2%
0.2045454545            69      0.2%
0.2142676768            68      0.2%
0.2058080808            68      0.2%
0.2089646465            68      0.2%
0.2143939394            67      0.2%
0.203030303             67      0.2%
0.2098484848            67      0.2%
0.2152777778            66      0.2%
0.2005050505            66      0.2%
0.2080808081            66      0.2%
0.1991161616            65      0.2%
0.2068181818            64      0.2%
Other values (1408)     29893   94.5%

Value                   Count   Frequency (%)
0                       1       < 0.1%
0.003156565657          1       < 0.1%
0.003409090909          4       < 0.1%
0.007954545455          2       < 0.1%
0.008207070707          1       < 0.1%
0.01161616162           2       < 0.1%
0.01174242424           3       < 0.1%
0.01477272727           1       < 0.1%
0.0148989899            2       < 0.1%
0.01553030303           2       < 0.1%

Value                   Count   Frequency (%)
1                       1       < 0.1%
0.558459596             1       < 0.1%
0.5186868687            1       < 0.1%
0.5101010101            1       < 0.1%
0.4953282828            1       < 0.1%
0.4888888889            1       < 0.1%
0.4816919192            1       < 0.1%
0.4789141414            1       < 0.1%
0.4715909091            1       < 0.1%
0.4696969697            1       < 0.1%
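
The first frequency table for each variable lists the 25 most common values and folds the remainder into an "Other values (k)" row, where k is the number of remaining distinct values (for 5k, 1433 - 25 = 1408). A minimal sketch of that kind of summary using only the standard library; the function name `frequency_table` and the sample data are illustrative and not part of pandas-profiling:

```python
from collections import Counter


def frequency_table(values, top_k=25):
    """Return (label, count, percent) rows for the top_k values plus an 'Other' row."""
    counts = Counter(values)
    total = len(values)
    top = counts.most_common(top_k)
    rows = [(value, count, 100.0 * count / total) for value, count in top]
    n_other = len(counts) - len(top)  # distinct values not shown individually
    if n_other > 0:
        other_count = total - sum(count for _, count in top)
        rows.append((f"Other values ({n_other})", other_count,
                     100.0 * other_count / total))
    return rows


# Toy sample (not taken from the report).
sample = ["USA"] * 6 + ["CAN"] * 3 + ["GBR", "ITA", "MEX"]
table = frequency_table(sample, top_k=2)
for label, count, pct in table:
    print(f"{label:<20} {count:>5} {pct:5.1f}%")
```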
 

10k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 2659
Unique (%): 8.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3655919974301697
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.240497076
Q1: 0.300229741
Median: 0.3515037594
Q3: 0.421679198
95-th percentile: 0.532059315
Maximum: 1
Range: 1
Interquartile range (IQR): 0.121449457

Descriptive statistics

Standard deviation: 0.09095901505
Coefficient of variation (CV): 0.2487992508
Kurtosis: 0.4535122883
Mean: 0.3655919974
Median Absolute Deviation (MAD): 0.05931495405
Skewness: 0.5585924837
Sum: 11570.25553
Variance: 0.008273542419
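
The median absolute deviation reported above is a robust measure of spread. A minimal sketch, assuming the usual definition (the median of absolute deviations from the median); the report does not spell out its formula, and the sample data are made up:

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


# One extreme value barely moves the MAD but inflates the standard deviation.
sample = [1, 2, 3, 4, 100]
mad = median_absolute_deviation(sample)
std = statistics.pstdev(sample)
print(f"MAD = {mad}, population standard deviation = {std:.2f}")
```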
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.3316624896            47      0.1%
0.337823726             47      0.1%
0.3355263158            42      0.1%
0.3279030911            42      0.1%
0.3425229741            41      0.1%
0.3367794486            41      0.1%
0.3237259816            41      0.1%
0.3264411028            40      0.1%
0.3308270677            40      0.1%
0.3430451128            39      0.1%
0.3376148705            39      0.1%
0.3193400167            38      0.1%
0.3306182122            38      0.1%
0.3412698413            37      0.1%
0.3186090226            37      0.1%
0.3294695071            37      0.1%
0.3280075188            37      0.1%
0.3344820384            37      0.1%
0.34732665              37      0.1%
0.3491019215            36      0.1%
0.330304929             36      0.1%
0.3285296575            36      0.1%
0.3414786967            36      0.1%
0.3383458647            36      0.1%
0.3256056809            36      0.1%
Other values (2634)     30675   96.9%

Value                   Count   Frequency (%)
0                       1       < 0.1%
0.0052213868            3       < 0.1%
0.005325814536          1       < 0.1%
0.007832080201          1       < 0.1%
0.01075605681           1       < 0.1%
0.01096491228           1       < 0.1%
0.01576858814           1       < 0.1%
0.01618629908           3       < 0.1%
0.01754385965           2       < 0.1%
0.02224310777           1       < 0.1%

Value                   Count   Frequency (%)
1                       1       < 0.1%
0.9001670844            1       < 0.1%
0.8182957393            1       < 0.1%
0.8167293233            1       < 0.1%
0.8127610693            1       < 0.1%
0.7775689223            1       < 0.1%
0.7681704261            1       < 0.1%
0.7615914787            1       < 0.1%
0.7492690058            1       < 0.1%
0.7478070175            1       < 0.1%

20k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 5228
Unique (%): 16.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3677107411271664
Minimum: 0.0
Maximum: 0.9999999999999999
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.231997344
Q1: 0.2951925631
Median: 0.3500664011
Q3: 0.4252324037
95-th percentile: 0.5621912351
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1300398406

Descriptive statistics

Standard deviation: 0.1017784116
Coefficient of variation (CV): 0.2767893352
Kurtosis: 0.596195049
Mean: 0.3677107411
Median Absolute Deviation (MAD): 0.06332005312
Skewness: 0.7219371765
Sum: 11637.30954
Variance: 0.01035884507
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.3358831341            28      0.1%
0.3220717131            27      0.1%
0.3330677291            25      0.1%
0.3297211155            25      0.1%
0.3213811421            24      0.1%
0.3248339973            24      0.1%
0.3191500664            23      0.1%
0.3315803453            23      0.1%
0.3360956175            23      0.1%
0.3373173971            23      0.1%
0.3852324037            22      0.1%
0.333439575             22      0.1%
0.321752988             22      0.1%
0.3451792829            22      0.1%
0.3879415671            22      0.1%
0.3294555113            22      0.1%
0.3245683931            21      0.1%
0.3391766268            21      0.1%
0.2632138114            21      0.1%
0.3212217795            21      0.1%
0.3293492696            21      0.1%
0.3257370518            21      0.1%
0.3078353254            21      0.1%
0.3124568393            21      0.1%
0.329189907             21      0.1%
Other values (5203)     31082   98.2%

Value                   Count   Frequency (%)
0                       1       < 0.1%
0.002549800797          1       < 0.1%
0.002656042497          1       < 0.1%
0.004674634794          1       < 0.1%
0.00823373174           2       < 0.1%
0.01019920319           1       < 0.1%
0.01184594954           1       < 0.1%
0.01556440903           1       < 0.1%
0.01567065073           3       < 0.1%
0.01583001328           1       < 0.1%

Value                   Count   Frequency (%)
1                       1       < 0.1%
0.9330677291            1       < 0.1%
0.8702257636            1       < 0.1%
0.8500398406            1       < 0.1%
0.8152988048            1       < 0.1%
0.8142363878            1       < 0.1%
0.8129083665            1       < 0.1%
0.8123240372            1       < 0.1%
0.8048871182            1       < 0.1%
0.797184595             1       < 0.1%

half
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 5489
Unique (%): 17.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.36841336781468886
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2322140556
Q1: 0.295610149
Median: 0.3506343133
Q3: 0.4261478051
95-th percentile: 0.5643198752
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1305376561

Descriptive statistics

Standard deviation: 0.1022286079
Coefficient of variation (CV): 0.277483438
Kurtosis: 0.6026377855
Mean: 0.3684133678
Median Absolute Deviation (MAD): 0.06322996375
Skewness: 0.7274735953
Sum: 11659.54626
Variance: 0.01045068828
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.3219391865            27      0.1%
0.3321083367            26      0.1%
0.3191703584            24      0.1%
0.3285843737            23      0.1%
0.3411699557            23      0.1%
0.3254128071            23      0.1%
0.3215364478            22      0.1%
0.3304470399            22      0.1%
0.308195731             22      0.1%
0.3506343133            21      0.1%
0.3211337092            21      0.1%
0.3171566653            21      0.1%
0.3271747886            21      0.1%
0.3423278292            21      0.1%
0.3372432541            21      0.1%
0.3413209827            20      0.1%
0.3376459928            20      0.1%
0.3244563029            20      0.1%
0.3921163915            20      0.1%
0.3253121224            20      0.1%
0.3166028997            20      0.1%
0.3270741039            20      0.1%
0.2547321788            19      0.1%
0.3055779299            19      0.1%
0.3079440193            19      0.1%
Other values (5464)     31113   98.3%

Value                   Count   Frequency (%)
0                       1       < 0.1%
0.002869512686          2       < 0.1%
0.005487313733          1       < 0.1%
0.008507853403          2       < 0.1%
0.01092428514           1       < 0.1%
0.01218284333           1       < 0.1%
0.01585783327           2       < 0.1%
0.01595851792           3       < 0.1%
0.01756947241           1       < 0.1%
0.01897905759           1       < 0.1%

Value                   Count   Frequency (%)
1                       1       < 0.1%
0.9265002014            1       < 0.1%
0.8720298027            1       < 0.1%
0.8517418445            1       < 0.1%
0.8216874748            1       < 0.1%
0.8190696738            1       < 0.1%
0.8152436569            1       < 0.1%
0.807591623             1       < 0.1%
0.8073399114            1       < 0.1%
0.8025573903            1       < 0.1%

25k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 6556
Unique (%): 20.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3609134973026622
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2233368966
Q1: 0.2862840227
Median: 0.3414604062
Q3: 0.4189622564
95-th percentile: 0.5666474796
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1326782337

Descriptive statistics

Standard deviation: 0.1051387952
Coefficient of variation (CV): 0.2913130043
Kurtosis: 0.6657078364
Mean: 0.3609134973
Median Absolute Deviation (MAD): 0.06405723214
Skewness: 0.7952161924
Sum: 11422.19036
Variance: 0.01105416625
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.3186415591            25      0.1%
0.3468875915            21      0.1%
0.2964394375            21      0.1%
0.3176959132            21      0.1%
0.3215196119            21      0.1%
0.3583175726            21      0.1%
0.2977551188            20      0.1%
0.3101307458            20      0.1%
0.30926733              20      0.1%
0.3267412219            19      0.1%
0.3120631527            19      0.1%
0.3224241428            19      0.1%
0.3054025162            19      0.1%
0.3304826906            18      0.1%
0.3297426199            18      0.1%
0.2790477757            18      0.1%
0.3253021955            18      0.1%
0.3018666228            17      0.1%
0.3163802319            17      0.1%
0.313419949             17      0.1%
0.3420771318            17      0.1%
0.3001397911            17      0.1%
0.3162568868            17      0.1%
0.3042512951            17      0.1%
0.3069237727            17      0.1%
Other values (6531)     31174   98.5%

Value                   Count   Frequency (%)
0                       1       < 0.1%
0.002343557273          2       < 0.1%
0.00678398158           1       < 0.1%
0.007688512458          2       < 0.1%
0.01130663597           1       < 0.1%
0.01266343228           1       < 0.1%
0.01377353836           4       < 0.1%
0.01385576844           1       < 0.1%
0.01624044075           1       < 0.1%
0.01862511307           2       < 0.1%

Value                   Count   Frequency (%)
1                       1       < 0.1%
0.9705205164            1       < 0.1%
0.9023517803            1       < 0.1%
0.8769015706            1       < 0.1%
0.8419537867            1       < 0.1%
0.8229997533            1       < 0.1%
0.8139544445            1       < 0.1%
0.8109119316            1       < 0.1%
0.8049091358            1       < 0.1%
0.8019488529            1       < 0.1%

30k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 7926
Unique (%): 25.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.33583957374459794
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2029303285
Q1: 0.2627606039
Median: 0.3158503423
Q3: 0.3930547307
95-th percentile: 0.5379708061
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1302941268

Descriptive statistics

Standard deviation: 0.1028032319
Coefficient of variation (CV): 0.3061081539
Kurtosis: 0.6447323519
Mean: 0.3358395737
Median Absolute Deviation (MAD): 0.06226362017
Skewness: 0.8279680976
Sum: 10628.65083
Variance: 0.0105685045
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.280905198             19      0.1%
0.2994405026            18      0.1%
0.2796549245            17      0.1%
0.343200075             17      0.1%
0.3085674991            17      0.1%
0.3118494671            16      0.1%
0.3042853124            16      0.1%
0.2863126309            16      0.1%
0.2507736067            16      0.1%
0.2835620292            16      0.1%
0.3167255337            16      0.1%
0.3396680524            15      < 0.1%
0.2836557997            15      < 0.1%
0.3023161317            15      < 0.1%
0.3276654268            15      < 0.1%
0.3082549308            15      < 0.1%
0.2232363329            15      < 0.1%
0.2950645454            15      < 0.1%
0.2728096771            15      < 0.1%
0.220923327             15      < 0.1%
0.3375738443            15      < 0.1%
0.2917200638            14      < 0.1%
0.3263213828            14      < 0.1%
0.2979401744            14      < 0.1%
0.2828118651            14      < 0.1%
Other values (7901)     31258   98.8%

Value                   Count   Frequency (%)
0                       1       < 0.1%
0.001812896571          2       < 0.1%
0.006720220048          1       < 0.1%
0.007126558935          1       < 0.1%
0.008533116619          1       < 0.1%
0.01037727003           1       < 0.1%
0.01084612259           2       < 0.1%
0.01162754353           1       < 0.1%
0.01365923796           1       < 0.1%
0.01375300847           1       < 0.1%

Value                   Count   Frequency (%)
1                       1       < 0.1%
0.9713062232            1       < 0.1%
0.8367455381            1       < 0.1%
0.8135217079            1       < 0.1%
0.7912043259            1       < 0.1%
0.7911730691            1       < 0.1%
0.7751695683            1       < 0.1%
0.7635420248            1       < 0.1%
0.7602600569            1       < 0.1%
0.7597599475            1       < 0.1%

35k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 9342
Unique (%): 29.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3367644873239581
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.1990783169
Q1: 0.2606111387
Median: 0.3153360739
Q3: 0.3978686078
95-th percentile: 0.5464965568
Maximum: 1
Range: 1
Interquartile range (IQR): 0.137257469

Descriptive statistics

Standard deviation: 0.1067676473
Coefficient of variation (CV): 0.3170395079
Kurtosis: 0.5190601842
Mean: 0.3367644873
Median Absolute Deviation (MAD): 0.06504150192
Skewness: 0.8164986246
Sum: 10657.92249
Variance: 0.01139933052
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.2853551884            17      0.1%
0.3112251578            17      0.1%
0.2770286193            16      0.1%
0.2647220549            16      0.1%
0.2956193868            15      < 0.1%
0.2952266241            15      < 0.1%
0.3015370113            15      < 0.1%
0.2708229688            15      < 0.1%
0.2624963997            14      < 0.1%
0.2898064989            14      < 0.1%
0.2604278495            14      < 0.1%
0.2942578094            14      < 0.1%
0.2701683643            14      < 0.1%
0.2676023147            14      < 0.1%
0.2812442722            13      < 0.1%
0.2682831033            13      < 0.1%
0.3172213349            13      < 0.1%
0.2882354481            13      < 0.1%
0.3015108272            13      < 0.1%
0.2673666571            13      < 0.1%
0.2935508366            13      < 0.1%
0.2742269121            13      < 0.1%
0.2945196512            13      < 0.1%
0.2888376843            13      < 0.1%
0.3022439842            13      < 0.1%
Other values (9317)     31295   98.9%

Value                   Count   Frequency (%)
0                       1       < 0.1%
0.00107355136           2       < 0.1%
0.007357754445          1       < 0.1%
0.008326569087          1       < 0.1%
0.008928805216          2       < 0.1%
0.009845251499          1       < 0.1%
0.01042130345           1       < 0.1%
0.01246366945           1       < 0.1%
0.01356340499           1       < 0.1%
0.01361577335           1       < 0.1%

Value                   Count   Frequency (%)
1                       1       < 0.1%
0.9750988453            1       < 0.1%
0.8488387316            1       < 0.1%
0.8199837658            1       < 0.1%
0.7993506323            1       < 0.1%
0.7808122332            1       < 0.1%
0.7751826347            1       < 0.1%
0.7704694823            1       < 0.1%
0.769683957             1       < 0.1%
0.7648660679            1       < 0.1%

40k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10391
Unique (%): 32.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3518576171314184
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.206794679
Q1: 0.2718796992
Median: 0.3293811452
Q3: 0.4175130133
95-th percentile: 0.568189705
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1456333141

Descriptive statistics

Standard deviation: 0.1112207779
Coefficient of variation (CV): 0.3160959789
Kurtosis: 0.3720835143
Mean: 0.3518576171
Median Absolute Deviation (MAD): 0.06860613071
Skewness: 0.7709058553
Sum: 11135.58987
Variance: 0.01237006144
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.2995951417            16      0.1%
0.290503181             15      < 0.1%
0.3057721226            15      < 0.1%
0.300242915             14      < 0.1%
0.328582996             13      < 0.1%
0.2896934644            13      < 0.1%
0.30264893              13      < 0.1%
0.2977906304            13      < 0.1%
0.2863389242            13      < 0.1%
0.3305957201            12      < 0.1%
0.3176402545            12      < 0.1%
0.287611336             12      < 0.1%
0.2839791787            12      < 0.1%
0.293302487             12      < 0.1%
0.298021978             12      < 0.1%
0.3253209948            11      < 0.1%
0.303713129             11      < 0.1%
0.3405899364            11      < 0.1%
0.3029959514            11      < 0.1%
0.2639213418            11      < 0.1%
0.2762753036            11      < 0.1%
0.320578369             11      < 0.1%
0.2983921342            11      < 0.1%
0.3024869867            11      < 0.1%
0.277200694             11      < 0.1%
Other values (10366)    31341   99.0%

Value                   Count   Frequency (%)
0                       1       < 0.1%
0.001388085599          1       < 0.1%
0.001434355119          1       < 0.1%
0.008629265471          1       < 0.1%
0.008837478311          1       < 0.1%
0.009438982071          2       < 0.1%
0.01001735107           1       < 0.1%
0.01045691151           1       < 0.1%
0.01325621747           1       < 0.1%
0.01404279931           2       < 0.1%

Value                   Count   Frequency (%)
1                       1       < 0.1%
0.9995835743            1       < 0.1%
0.8788201272            1       < 0.1%
0.8339386929            1       < 0.1%
0.8263273569            1       < 0.1%
0.7991440139            1       < 0.1%
0.7908617698            1       < 0.1%
0.787854251             1       < 0.1%
0.7789936379            1       < 0.1%
0.7756390977            1       < 0.1%

official
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10848
Unique (%): 34.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.35208361312506
Minimum: 0.0
Maximum: 1.0000000000000002
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.206598586
Q1: 0.2721043903
Median: 0.3303220738
Q3: 0.4180850135
95-th percentile: 0.5671489046
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1459806232

Descriptive statistics

Standard deviation: 0.1109437003
Coefficient of variation (CV): 0.3151061173
Kurtosis: 0.3291582404
Mean: 0.3520836131
Median Absolute Deviation (MAD): 0.06914986471
Skewness: 0.7517530927
Sum: 11142.74219
Variance: 0.01230850463
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.2573754037            14      < 0.1%
0.3142183818            14      < 0.1%
0.2788033517            14      < 0.1%
0.3401195776            13      < 0.1%
0.270096884             13      < 0.1%
0.3116435367            13      < 0.1%
0.3184297809            12      < 0.1%
0.2747665183            12      < 0.1%
0.2904774374            12      < 0.1%
0.2999258095            12      < 0.1%
0.287924413             12      < 0.1%
0.3337042856            12      < 0.1%
0.3352317361            11      < 0.1%
0.324365017             11      < 0.1%
0.3135637601            11      < 0.1%
0.3342279829            11      < 0.1%
0.2896046085            11      < 0.1%
0.338439382             11      < 0.1%
0.3167277647            11      < 0.1%
0.2623287073            11      < 0.1%
0.2902810509            11      < 0.1%
0.273850048             11      < 0.1%
0.27814873              11      < 0.1%
0.3070175439            11      < 0.1%
0.3089814087            11      < 0.1%
Other values (10823)    31352   99.1%

Value                   Count   Frequency (%)
0                       1       < 0.1%
0.00137470542           2       < 0.1%
0.008837391987          1       < 0.1%
0.008946495592          1       < 0.1%
0.009208344244          1       < 0.1%
0.009339268569          1       < 0.1%
0.01014663524           1       < 0.1%
0.01043030462           1       < 0.1%
0.0137470542            1       < 0.1%
0.01385615781           2       < 0.1%

Value                   Count   Frequency (%)
1                       1       < 0.1%
0.989831544             1       < 0.1%
0.8710831806            1       < 0.1%
0.8319586279            1       < 0.1%
0.8225320765            1       < 0.1%
0.7938378284            1       < 0.1%
0.7835166274            1       < 0.1%
0.7800471328            1       < 0.1%
0.7773195426            1       < 0.1%
0.7750283669            1       < 0.1%

pace
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 702
Unique (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3520001462629808
Minimum: 0.0
Maximum: 0.9999999999999998
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2069754145
Q1: 0.2715837621
Median: 0.3299028016
Q3: 0.4173813608
95-th percentile: 0.5671812464
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1457975986

Descriptive statistics

Standard deviation: 0.1108735157
Coefficient of variation (CV): 0.3149814478
Kurtosis: 0.3273589478
Mean: 0.3520001463
Median Absolute Deviation (MAD): 0.06861063465
Skewness: 0.7513709293
Sum: 11140.10063
Variance: 0.01229293648
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.303030303             162     0.5%
0.2813036021            150     0.5%
0.2973127501            150     0.5%
0.2858776444            149     0.5%
0.2801600915            149     0.5%
0.2995997713            149     0.5%
0.3339050886            148     0.5%
0.3138936535            147     0.5%
0.2710120069            146     0.5%
0.3253287593            145     0.5%
0.3013150372            144     0.5%
0.291023442             143     0.5%
0.3024585477            143     0.5%
0.3116066324            141     0.4%
0.3384791309            140     0.4%
0.2784448256            140     0.4%
0.2984562607            140     0.4%
0.2967409949            140     0.4%
0.2824471126            140     0.4%
0.3081761006            139     0.4%
0.277301315             138     0.4%
0.3367638651            138     0.4%
0.2938822184            137     0.4%
0.2955974843            136     0.4%
0.3058890795            136     0.4%
Other values (677)      28058   88.7%

Value                   Count   Frequency (%)
0                       1       < 0.1%
0.001143510577          2       < 0.1%
0.008576329331          2       < 0.1%
0.009719839909          2       < 0.1%
0.01086335049           2       < 0.1%
0.01372212693           3       < 0.1%
0.0154373928            1       < 0.1%
0.01943967982           2       < 0.1%
0.02687249857           1       < 0.1%
0.03144654088           1       < 0.1%

Value                   Count   Frequency (%)
1                       1       < 0.1%
0.9891366495            1       < 0.1%
0.8702115495            1       < 0.1%
0.8313321898            1       < 0.1%
0.8227558605            1       < 0.1%
0.7941680961            1       < 0.1%
0.7833047456            1       < 0.1%
0.7827329903            1       < 0.1%
0.7798742138            1       < 0.1%
0.7741566609            1       < 0.1%

overall
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 31595
Unique (%): 99.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15834.008657735087
Minimum: 1
Maximum: 31931
Zeros: 0
Zeros (%): 0.0%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1530.35
Q1: 7879.75
Median: 15829.5
Q3: 23785.25
95-th percentile: 30149.65
Maximum: 31931
Range: 31930
Interquartile range (IQR): 15905.5

Descriptive statistics

Standard deviation: 9182.364552
Coefficient of variation (CV): 0.5799140793
Kurtosis: -1.199877423
Mean: 15834.00866
Median Absolute Deviation (MAD): 7953
Skewness: 0.001798552736
Sum: 501114706
Variance: 84315818.76
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
4                       2       < 0.1%
49                      2       < 0.1%
6                       2       < 0.1%
34                      2       < 0.1%
10                      2       < 0.1%
2                       2       < 0.1%
38                      2       < 0.1%
22                      2       < 0.1%
14                      2       < 0.1%
30                      2       < 0.1%
46                      2       < 0.1%
25                      2       < 0.1%
47                      2       < 0.1%
17                      2       < 0.1%
41                      2       < 0.1%
33                      2       < 0.1%
7                       2       < 0.1%
1                       2       < 0.1%
9                       2       < 0.1%
39                      2       < 0.1%
31                      2       < 0.1%
23                      2       < 0.1%
18                      2       < 0.1%
42                      2       < 0.1%
50                      2       < 0.1%
Other values (31570)    31598   99.8%

Value                   Count   Frequency (%)
1                       2       < 0.1%
2                       2       < 0.1%
3                       2       < 0.1%
4                       2       < 0.1%
5                       2       < 0.1%
6                       2       < 0.1%
7                       2       < 0.1%
8                       2       < 0.1%
9                       2       < 0.1%
10                      2       < 0.1%

Value                   Count   Frequency (%)
31931                   1       < 0.1%
31930                   1       < 0.1%
31929                   1       < 0.1%
31926                   1       < 0.1%
31925                   1       < 0.1%
31917                   1       < 0.1%
31909                   1       < 0.1%
31907                   1       < 0.1%
31906                   1       < 0.1%
31903                   1       < 0.1%

age
Real number (ℝ≥0)

Distinct count: 64
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.38705873738103264
Minimum: 0.0
Maximum: 0.9999999999999999
Zeros: 33
Zeros (%): 0.1%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.1111111111
Q1: 0.2380952381
Median: 0.380952381
Q3: 0.5079365079
95-th percentile: 0.6825396825
Maximum: 1
Range: 1
Interquartile range (IQR): 0.2698412698

Descriptive statistics

Standard deviation: 0.1793616581
Coefficient of variation (CV): 0.4633964842
Kurtosis: -0.56427208
Mean: 0.3870587374
Median Absolute Deviation (MAD): 0.126984127
Skewness: 0.1670120483
Sum: 12249.63492
Variance: 0.03217060439
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
0.4285714286            1191    3.8%
0.4444444444            1149    3.6%
0.3492063492            1075    3.4%
0.4603174603            1053    3.3%
0.380952381             1034    3.3%
0.5079365079            1004    3.2%
0.3650793651            976     3.1%
0.3968253968            961     3.0%
0.4761904762            904     2.9%
0.5238095238            897     2.8%
0.4920634921            878     2.8%
0.4126984127            876     2.8%
0.3333333333            873     2.8%
0.2857142857            873     2.8%
0.2698412698            835     2.6%
0.3015873016            805     2.5%
0.5396825397            798     2.5%
0.3174603175            792     2.5%
0.2222222222            780     2.5%
0.1587301587            759     2.4%
0.1904761905            756     2.4%
0.2063492063            749     2.4%
0.1428571429            733     2.3%
0.1746031746            727     2.3%
0.2380952381            712     2.2%
Other values (39)       9458    29.9%

Value                   Count   Frequency (%)
0                       33      0.1%
0.01587301587           41      0.1%
0.03174603175           111     0.4%
0.04761904762           171     0.5%
0.06349206349           275     0.9%
0.07936507937           346     1.1%
0.09523809524           479     1.5%
0.1111111111            555     1.8%
0.126984127             702     2.2%
0.1428571429            733     2.3%

Value                   Count   Frequency (%)
1                       5       < 0.1%
0.9841269841            3       < 0.1%
0.9682539683            3       < 0.1%
0.9523809524            5       < 0.1%
0.9365079365            6       < 0.1%
0.9206349206            13      < 0.1%
0.9047619048            17      0.1%
0.8888888889            13      < 0.1%
0.873015873             22      0.1%
0.8571428571            31      0.1%

gender
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 247.4 KiB

Value    Count    Frequency (%)
M        17484    55.2%
F        14164    44.8%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 2
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
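
As an illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point. A minimal sketch that tallies categories across a list of strings; the helper name and sample values are illustrative, and the two-letter codes map to the long names used in this report (`Lu` = Uppercase Letter, `Ll` = Lowercase Letter, `Zs` = Space Separator):

```python
import unicodedata
from collections import Counter


def unicode_category_counts(strings):
    """Count Unicode general categories over every character in the strings."""
    counts = Counter()
    for text in strings:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts


# Sample values of the kind found in the state and city columns.
values = ["MA", "others", "New York"]
category_counts = unicode_category_counts(values)
for category, count in category_counts.most_common():
    print(category, count)
```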

Most occurring characters

Value    Count    Frequency (%)
M        17484    55.2%
F        14164    44.8%

Most occurring categories

Value               Count    Frequency (%)
Uppercase Letter    31648    100.0%

Most frequent Uppercase Letter characters

Value    Count    Frequency (%)
M        17484    55.2%
F        14164    44.8%

Most occurring scripts

Value    Count    Frequency (%)
Latin    31648    100.0%

Most frequent Latin characters

Value    Count    Frequency (%)
M        17484    55.2%
F        14164    44.8%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    31648    100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
M        17484    55.2%
F        14164    44.8%

genderdiv
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 17490
Unique (%): 55.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8003.025404448938
Minimum: 1
Maximum: 17575
Zeros: 0
Zeros (%): 0.0%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 767.35
Q1: 3940
Median: 7919
Q3: 11893
95-th percentile: 15914.65
Maximum: 17575
Range: 17574
Interquartile range (IQR): 7953

Descriptive statistics

Standard deviation: 4733.295012
Coefficient of variation (CV): 0.591438209
Kurtosis: -1.072844482
Mean: 8003.025404
Median Absolute Deviation (MAD): 3976.5
Skewness: 0.09915551468
Sum: 253279748
Variance: 22404081.67
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
5                       4       < 0.1%
4                       4       < 0.1%
2                       4       < 0.1%
7                       4       < 0.1%
3                       4       < 0.1%
8                       4       < 0.1%
9                       4       < 0.1%
10                      4       < 0.1%
6                       4       < 0.1%
11                      4       < 0.1%
1                       4       < 0.1%
31                      3       < 0.1%
23                      3       < 0.1%
35                      3       < 0.1%
28                      3       < 0.1%
29                      3       < 0.1%
26                      3       < 0.1%
19                      3       < 0.1%
41                      3       < 0.1%
39                      3       < 0.1%
33                      3       < 0.1%
14                      3       < 0.1%
17                      3       < 0.1%
38                      3       < 0.1%
30                      3       < 0.1%
Other values (17465)    31562   99.7%

Value                   Count   Frequency (%)
1                       4       < 0.1%
2                       4       < 0.1%
3                       4       < 0.1%
4                       4       < 0.1%
5                       4       < 0.1%
6                       4       < 0.1%
7                       4       < 0.1%
8                       4       < 0.1%
9                       4       < 0.1%
10                      4       < 0.1%

Value                   Count   Frequency (%)
17575                   1       < 0.1%
17574                   1       < 0.1%
17573                   1       < 0.1%
17571                   1       < 0.1%
17562                   1       < 0.1%
17561                   1       < 0.1%
17558                   1       < 0.1%
17555                   1       < 0.1%
17554                   1       < 0.1%
17553                   1       < 0.1%

division
Real number (ℝ≥0)

Distinct count: 6921
Unique (%): 21.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1922.8003665318504
Minimum: 1
Maximum: 6979
Zeros: 0
Zeros (%): 0.0%
Memory size: 247.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 98
Q1: 607
Median: 1418
Q3: 2601
95-th percentile: 5613.65
Maximum: 6979
Range: 6978
Interquartile range (IQR): 1994

Descriptive statistics

Standard deviation: 1704.844465
Coefficient of variation (CV): 0.8866466301
Kurtosis: 0.3354066435
Mean: 1922.800367
Median Absolute Deviation (MAD): 917
Skewness: 1.113716829
Sum: 60852786
Variance: 2906494.651
Histogram with fixed size bins (bins=10)
Value                   Count   Frequency (%)
1                       23      0.1%
3                       22      0.1%
2                       22      0.1%
8                       20      0.1%
5                       20      0.1%
7                       20      0.1%
4                       20      0.1%
9                       19      0.1%
11                      19      0.1%
10                      19      0.1%
6                       19      0.1%
19                      18      0.1%
14                      18      0.1%
15                      18      0.1%
18                      18      0.1%
13                      18      0.1%
17                      18      0.1%
23                      17      0.1%
24                      17      0.1%
31                      17      0.1%
39                      17      0.1%
12                      17      0.1%
38                      17      0.1%
22                      17      0.1%
27                      17      0.1%
Other values (6896)     31181   98.5%

Value                   Count   Frequency (%)
1                       23      0.1%
2                       22      0.1%
3                       22      0.1%
4                       20      0.1%
5                       20      0.1%
6                       19      0.1%
7                       20      0.1%
8                       20      0.1%
9                       19      0.1%
10                      19      0.1%

Value                   Count   Frequency (%)
6979                    1       < 0.1%
6977                    1       < 0.1%
6969                    1       < 0.1%
6968                    1       < 0.1%
6962                    1       < 0.1%
6960                    1       < 0.1%
6953                    1       < 0.1%
6952                    1       < 0.1%
6951                    1       < 0.1%
6950                    1       < 0.1%

country
Categorical

HIGH CARDINALITY

Distinct count: 78
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 247.4 KiB
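
The report flags country (78 distinct values), state (69), and city (5905) as high cardinality, but not gender (2). A minimal sketch of such a check, assuming a cutoff of 50 distinct values; the exact threshold used for this report is an assumption, and the function name is illustrative:

```python
def high_cardinality(distinct_counts, threshold=50):
    """Return the variables whose distinct-value count exceeds the threshold."""
    return {name: n for name, n in distinct_counts.items() if n > threshold}


# Distinct counts copied from the categorical variables in this report.
distinct = {"gender": 2, "country": 78, "state": 69, "city": 5905}
flagged_categoricals = high_cardinality(distinct)
for name, n in flagged_categoricals.items():
    print(f"{name} has a high cardinality: {n} distinct values")
```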
Value                Count    Frequency (%)
USA                  26939    85.1%
CAN                  2164     6.8%
GBR                  341      1.1%
ITA                  209      0.7%
MEX                  202      0.6%
GER                  180      0.6%
JPN                  172      0.5%
AUS                  123      0.4%
IRL                  116      0.4%
FRA                  113      0.4%
ESP                  87       0.3%
BRA                  85       0.3%
SWE                  66       0.2%
SUI                  62       0.2%
KOR                  60       0.2%
NED                  54       0.2%
HKG                  53       0.2%
DEN                  49       0.2%
CHN                  42       0.1%
COL                  40       0.1%
CHI                  29       0.1%
FIN                  28       0.1%
ISL                  27       0.1%
BEL                  27       0.1%
SIN                  23       0.1%
Other values (53)    357      1.1%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 26
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1

Most occurring characters

Value    Count    Frequency (%)
A        29739    31.3%
S        27390    28.8%
U        27216    28.7%
N        2616     2.8%
C        2316     2.4%
R        1048     1.1%
E        782      0.8%
G        615      0.6%
I        515      0.5%
B        477      0.5%
P        315      0.3%
T        265      0.3%
L        258      0.3%
M        210      0.2%
X        206      0.2%
J        172      0.2%
O        161      0.2%
H        141      0.1%
F        141      0.1%
K        132      0.1%
D        112      0.1%
W        66       0.1%
V        24       < 0.1%
Z        22       < 0.1%
Y        3        < 0.1%

Most occurring categories

Value               Count    Frequency (%)
Uppercase Letter    94944    100.0%

Most occurring scripts

Value    Count    Frequency (%)
Latin    94944    100.0%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    94944    100.0%

Most frequent Uppercase Letter, Latin, and ASCII characters

Every character in this variable is an uppercase Latin letter in the ASCII block, so these three tables are identical to the "Most occurring characters" table above.

state
Categorical

HIGH CARDINALITY

Distinct count: 69
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 247.4 KiB

Value                Count    Frequency (%)
MA                   7427     23.5%
others               2545     8.0%
CA                   2302     7.3%
NY                   1537     4.9%
ON                   1045     3.3%
PA                   997      3.2%
TX                   988      3.1%
IL                   911      2.9%
OH                   754      2.4%
FL                   745      2.4%
VA                   709      2.2%
MI                   689      2.2%
NC                   617      1.9%
WA                   595      1.9%
CO                   576      1.8%
NJ                   564      1.8%
MN                   554      1.8%
MD                   522      1.6%
NH                   521      1.6%
CT                   520      1.6%
GA                   441      1.4%
UT                   429      1.4%
WI                   429      1.4%
QC                   418      1.3%
OR                   405      1.3%
Other values (44)    4408     13.9%

Length

Max length: 6
Median length: 2
Mean length: 2.321663296
Min length: 2

Overview of Unicode Properties

Unique unicode characters: 32
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1

Most occurring characters

Value               Count    Frequency (%)
A                   13433    18.3%
M                   9921     13.5%
N                   5865     8.0%
C                   5023     6.8%
O                   3125     4.3%
I                   2874     3.9%
o                   2545     3.5%
t                   2545     3.5%
h                   2545     3.5%
e                   2545     3.5%
r                   2545     3.5%
s                   2545     3.5%
T                   2406     3.3%
L                   1854     2.5%
Y                   1708     2.3%
H                   1323     1.8%
W                   1102     1.5%
P                   1038     1.4%
V                   993      1.4%
X                   988      1.3%
D                   974      1.3%
F                   745      1.0%
R                   702      1.0%
J                   564      0.8%
S                   539      0.7%
Other values (7)    3029     4.1%

Most occurring categories

Value               Count    Frequency (%)
Uppercase Letter    58206    79.2%
Lowercase Letter    15270    20.8%

Most frequent Lowercase Letter characters

Value    Count    Frequency (%)
o        2545     16.7%
t        2545     16.7%
h        2545     16.7%
e        2545     16.7%
r        2545     16.7%
s        2545     16.7%

Most frequent Uppercase Letter characters

Value    Count    Frequency (%)
A        13433    23.1%
M        9921     17.0%
N        5865     10.1%
C        5023     8.6%
O        3125     5.4%
I        2874     4.9%
T        2406     4.1%
L        1854     3.2%
Y        1708     2.9%
H        1323     2.3%
W        1102     1.9%
P        1038     1.8%
V        993      1.7%
X        988      1.7%
D        974      1.7%
F        745      1.3%
R        702      1.2%
J        564      1.0%
S        539      0.9%
B        513      0.9%
K        458      0.8%
E        452      0.8%
G        442      0.8%
U        430      0.7%
Q        418      0.7%

Most occurring scripts

Value    Count    Frequency (%)
Latin    73476    100.0%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    73476    100.0%

Most frequent Latin and ASCII characters

Every character in this variable is in the Latin script and the ASCII block, so these two tables are identical to the "Most occurring characters" table above.

city
Categorical

HIGH CARDINALITY

Distinct count: 5905
Unique (%): 18.7%
Missing: 0
Missing (%): 0.0%
Memory size: 247.4 KiB

Value                  Count    Frequency (%)
Boston                 1018     3.2%
New York               497      1.6%
Chicago                312      1.0%
Cambridge              306      1.0%
Toronto                239      0.8%
Somerville             239      0.8%
Brookline              219      0.7%
Washington             210      0.7%
Newton                 195      0.6%
San Francisco          192      0.6%
Austin                 191      0.6%
Portland               191      0.6%
Arlington              184      0.6%
San Diego              180      0.6%
Seattle                149      0.5%
Charlestown            148      0.5%
Brighton               148      0.5%
Houston                138      0.4%
London                 134      0.4%
Medford                132      0.4%
Brooklyn               131      0.4%
Ottawa                 130      0.4%
Quincy                 130      0.4%
South Boston           129      0.4%
Atlanta                121      0.4%
Other values (5880)    25985    82.1%

Length

Max length: 35
Median length: 8
Mean length: 8.799892568
Min length: 2

Overview of Unicode Properties

Unique unicode characters: 75
Unique unicode categories: 8
Unique unicode scripts: 2
Unique unicode blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
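
As a rough illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The function name `category_counts` and the sample strings are hypothetical, and the standard library reports two-letter codes (`Lu`, `Ll`, `Zs`, `Po`) rather than the long names shown in this report ("Uppercase Letter", "Lowercase Letter", "Space Separator", "Other Punctuation"). This is not necessarily how the report itself computes its tables.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            # unicodedata.category returns a two-letter code such as "Lu" or "Ll"
            counts[unicodedata.category(ch)] += 1
    return counts


# Hypothetical sample values, chosen only for illustration
sample = ["Boston", "New York", "St. John's"]
counts = category_counts(sample)
for code, count in counts.most_common():
    print(code, count)
```

Dividing each count by the total number of characters gives the frequency column used in the tables of this section.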

Most occurring characters

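Value | Count | Frequency (%)
o | 24669 | 8.9%
e | 24331 | 8.7%
a | 22649 | 8.1%
n | 21545 | 7.7%
r | 17253 | 6.2%
l | 16837 | 6.0%
t | 16346 | 5.9%
i | 16194 | 5.8%
s | 11549 | 4.1%
(space) | 7910 | 2.8%
d | 7355 | 2.6%
h | 7279 | 2.6%
u | 6302 | 2.3%
g | 5442 | 2.0%
c | 4988 | 1.8%
m | 4558 | 1.6%
S | 4255 | 1.5%
B | 4197 | 1.5%
C | 3900 | 1.4%
k | 3863 | 1.4%
y | 3687 | 1.3%
w | 3341 | 1.2%
v | 3170 | 1.1%
b | 3083 | 1.1%
M | 2810 | 1.0%
Other values (50) | 30986 | 11.1%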

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 229980 | 82.6%
Uppercase Letter | 39787 | 14.3%
Space Separator | 7910 | 2.8%
Other Punctuation | 460 | 0.2%
Dash Punctuation | 267 | 0.1%
Open Punctuation | 37 | < 0.1%
Close Punctuation | 37 | < 0.1%
Decimal Number | 21 | < 0.1%

Most frequent Uppercase Letter characters

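Value | Count | Frequency (%)
S | 4255 | 10.7%
B | 4197 | 10.5%
C | 3900 | 9.8%
M | 2810 | 7.1%
W | 2488 | 6.3%
P | 2264 | 5.7%
A | 2226 | 5.6%
N | 2180 | 5.5%
L | 2172 | 5.5%
H | 1995 | 5.0%
R | 1535 | 3.9%
D | 1523 | 3.8%
F | 1280 | 3.2%
G | 1197 | 3.0%
T | 1094 | 2.7%
O | 863 | 2.2%
E | 764 | 1.9%
V | 638 | 1.6%
Y | 637 | 1.6%
K | 546 | 1.4%
J | 505 | 1.3%
I | 342 | 0.9%
Q | 212 | 0.5%
U | 126 | 0.3%
Z | 35 | 0.1%
Other values (2) | 3 | < 0.1%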

Most frequent Lowercase Letter characters

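Value | Count | Frequency (%)
o | 24669 | 10.7%
e | 24331 | 10.6%
a | 22649 | 9.8%
n | 21545 | 9.4%
r | 17253 | 7.5%
l | 16837 | 7.3%
t | 16346 | 7.1%
i | 16194 | 7.0%
s | 11549 | 5.0%
d | 7355 | 3.2%
h | 7279 | 3.2%
u | 6302 | 2.7%
g | 5442 | 2.4%
c | 4988 | 2.2%
m | 4558 | 2.0%
k | 3863 | 1.7%
y | 3687 | 1.6%
w | 3341 | 1.5%
v | 3170 | 1.4%
b | 3083 | 1.3%
p | 2405 | 1.0%
f | 1753 | 0.8%
x | 794 | 0.3%
z | 261 | 0.1%
q | 185 | 0.1%
Other values (7) | 141 | 0.1%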

Most frequent Space Separator characters

Value | Count | Frequency (%)
(space) | 7910 | 100.0%

Most frequent Dash Punctuation characters

Value | Count | Frequency (%)
- | 267 | 100.0%

Most frequent Open Punctuation characters

Value | Count | Frequency (%)
( | 37 | 100.0%

Most frequent Close Punctuation characters

Value | Count | Frequency (%)
) | 37 | 100.0%

Most frequent Other Punctuation characters

Value | Count | Frequency (%)
. | 285 | 62.0%
, | 122 | 26.5%
' | 48 | 10.4%
/ | 5 | 1.1%

Most frequent Decimal Number characters

Value | Count | Frequency (%)
1 | 5 | 23.8%
0 | 4 | 19.0%
2 | 3 | 14.3%
8 | 3 | 14.3%
6 | 2 | 9.5%
3 | 2 | 9.5%
9 | 1 | 4.8%
4 | 1 | 4.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 269767 | 96.9%
Common | 8732 | 3.1%

Most frequent Latin characters

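Value | Count | Frequency (%)
o | 24669 | 9.1%
e | 24331 | 9.0%
a | 22649 | 8.4%
n | 21545 | 8.0%
r | 17253 | 6.4%
l | 16837 | 6.2%
t | 16346 | 6.1%
i | 16194 | 6.0%
s | 11549 | 4.3%
d | 7355 | 2.7%
h | 7279 | 2.7%
u | 6302 | 2.3%
g | 5442 | 2.0%
c | 4988 | 1.8%
m | 4558 | 1.7%
S | 4255 | 1.6%
B | 4197 | 1.6%
C | 3900 | 1.4%
k | 3863 | 1.4%
y | 3687 | 1.4%
w | 3341 | 1.2%
v | 3170 | 1.2%
b | 3083 | 1.1%
M | 2810 | 1.0%
W | 2488 | 0.9%
Other values (34) | 27676 | 10.3%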

Most frequent Common characters

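Value | Count | Frequency (%)
(space) | 7910 | 90.6%
. | 285 | 3.3%
- | 267 | 3.1%
, | 122 | 1.4%
' | 48 | 0.5%
( | 37 | 0.4%
) | 37 | 0.4%
/ | 5 | 0.1%
1 | 5 | 0.1%
0 | 4 | < 0.1%
2 | 3 | < 0.1%
8 | 3 | < 0.1%
6 | 2 | < 0.1%
3 | 2 | < 0.1%
9 | 1 | < 0.1%
4 | 1 | < 0.1%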

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 278488 | > 99.9%
None | 11 | < 0.1%

Most frequent ASCII characters

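Value | Count | Frequency (%)
o | 24669 | 8.9%
e | 24331 | 8.7%
a | 22649 | 8.1%
n | 21545 | 7.7%
r | 17253 | 6.2%
l | 16837 | 6.0%
t | 16346 | 5.9%
i | 16194 | 5.8%
s | 11549 | 4.1%
(space) | 7910 | 2.8%
d | 7355 | 2.6%
h | 7279 | 2.6%
u | 6302 | 2.3%
g | 5442 | 2.0%
c | 4988 | 1.8%
m | 4558 | 1.6%
S | 4255 | 1.5%
B | 4197 | 1.5%
C | 3900 | 1.4%
k | 3863 | 1.4%
y | 3687 | 1.3%
w | 3341 | 1.2%
v | 3170 | 1.1%
b | 3083 | 1.1%
M | 2810 | 1.0%
Other values (43) | 30975 | 11.1%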

Most frequent None characters

Value | Count | Frequency (%)
ö | 2 | 18.2%
á | 2 | 18.2%
í | 2 | 18.2%
ã | 2 | 18.2%
â | 1 | 9.1%
Ø | 1 | 9.1%
ø | 1 | 9.1%

bib
Categorical

UNIQUE

Distinct count: 31648
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 247.4 KiB
Value | Count
10751 | 1
35698 | 1
23242 | 1
29215 | 1
14477 | 1
Other values (31643) | 31643

Value | Count | Frequency (%)
10751 | 1 | < 0.1%
35698 | 1 | < 0.1%
23242 | 1 | < 0.1%
29215 | 1 | < 0.1%
14477 | 1 | < 0.1%
29501 | 1 | < 0.1%
5242 | 1 | < 0.1%
24774 | 1 | < 0.1%
6419 | 1 | < 0.1%
27467 | 1 | < 0.1%
33671 | 1 | < 0.1%
16789 | 1 | < 0.1%
33670 | 1 | < 0.1%
5487 | 1 | < 0.1%
22908 | 1 | < 0.1%
34324 | 1 | < 0.1%
10940 | 1 | < 0.1%
20627 | 1 | < 0.1%
30740 | 1 | < 0.1%
7776 | 1 | < 0.1%
31916 | 1 | < 0.1%
1133 | 1 | < 0.1%
9516 | 1 | < 0.1%
12598 | 1 | < 0.1%
32484 | 1 | < 0.1%
Other values (31623) | 31623 | 99.9%

Length

Max length: 5
Median length: 5
Mean length: 4.693914307
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 12
Unique unicode categories: 2
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value | Count | Frequency (%)
2 | 22055 | 14.8%
1 | 21713 | 14.6%
3 | 18289 | 12.3%
4 | 13014 | 8.8%
5 | 12911 | 8.7%
7 | 12197 | 8.2%
6 | 12156 | 8.2%
0 | 12133 | 8.2%
8 | 12095 | 8.1%
9 | 11889 | 8.0%
W | 53 | < 0.1%
F | 48 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 148452 | 99.9%
Uppercase Letter | 101 | 0.1%

Most frequent Uppercase Letter characters

Value | Count | Frequency (%)
W | 53 | 52.5%
F | 48 | 47.5%

Most frequent Decimal Number characters

Value | Count | Frequency (%)
2 | 22055 | 14.9%
1 | 21713 | 14.6%
3 | 18289 | 12.3%
4 | 13014 | 8.8%
5 | 12911 | 8.7%
7 | 12197 | 8.2%
6 | 12156 | 8.2%
0 | 12133 | 8.2%
8 | 12095 | 8.1%
9 | 11889 | 8.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 148452 | 99.9%
Latin | 101 | 0.1%

Most frequent Latin characters

Value | Count | Frequency (%)
W | 53 | 52.5%
F | 48 | 47.5%

Most frequent Common characters

Value | Count | Frequency (%)
2 | 22055 | 14.9%
1 | 21713 | 14.6%
3 | 18289 | 12.3%
4 | 13014 | 8.8%
5 | 12911 | 8.7%
7 | 12197 | 8.2%
6 | 12156 | 8.2%
0 | 12133 | 8.2%
8 | 12095 | 8.1%
9 | 11889 | 8.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 148553 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
2 | 22055 | 14.8%
1 | 21713 | 14.6%
3 | 18289 | 12.3%
4 | 13014 | 8.8%
5 | 12911 | 8.7%
7 | 12197 | 8.2%
6 | 12156 | 8.2%
0 | 12133 | 8.2%
8 | 12095 | 8.1%
9 | 11889 | 8.0%
W | 53 | < 0.1%
F | 48 | < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
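
The calculation described above can be sketched in plain Python as follows. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the report; in practice a library routine such as pandas' `DataFrame.corr(method="pearson")` would normally be used.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r_original = pearson_r(x, y)
# Shifting and positively rescaling y leaves r unchanged
r_rescaled = pearson_r(x, [3.0 * v + 5.0 for v in y])
```

The sketch assumes neither input is constant; a constant variable has zero standard deviation and r is undefined.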

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
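
A minimal sketch of this rank-then-correlate procedure is shown below. The helper names `average_ranks` and `spearman_rho` are hypothetical, and ties are assumed to receive the average of the ranks they span, which is a common convention rather than something stated in this report.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A monotonic but nonlinear relationship (y = x**3)
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
```

Because only the ordering of the values matters, the cubic relationship in the example is treated the same as a perfectly linear one.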

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
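
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below. The function name `kendall_tau_a` and the sample data are illustrative assumptions. Library implementations commonly default to a tie-adjusted variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts toward neither
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical sample: one of the six pairs is out of order
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.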

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
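
The sketch below shows one way to compute a bias-corrected Cramér's V from two categorical sequences, using the correction commonly attributed to Bergsma (2013): the φ² estimate is reduced by (k-1)(r-1)/(n-1) and the table dimensions are shrunk accordingly. The function name `cramers_v_corrected` and the sample data are hypothetical, no continuity correction is applied, and this is not claimed to match the report generator's own implementation.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long categorical sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table
    chi2 = 0.0
    for row_value, row_count in row_totals.items():
        for col_value, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row_value, col_value), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    n_rows, n_cols = len(row_totals), len(col_totals)
    # Bias correction: shrink phi^2 and the effective table dimensions
    phi2_corr = max(0.0, phi2 - (n_cols - 1) * (n_rows - 1) / (n - 1))
    rows_corr = n_rows - (n_rows - 1) ** 2 / (n - 1)
    cols_corr = n_cols - (n_cols - 1) ** 2 / (n - 1)
    denom = min(rows_corr - 1, cols_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table, e.g. a single category
    return math.sqrt(phi2_corr / denom)


# Hypothetical samples: perfect association and exact independence
gender = ["M", "F"] * 50
perfectly_linked = ["x", "y"] * 50
v_perfect = cramers_v_corrected(gender, perfectly_linked)
v_independent = cramers_v_corrected(["M", "M", "F", "F"] * 25, ["x", "y", "x", "y"] * 25)
```

The two samples exercise the endpoints of the range described above: identical groupings give a value of 1, and a table whose observed counts equal the expected counts gives 0.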

Missing values

Sample

First rows

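(index) | 5k | 10k | 20k | half | 25k | 30k | 35k | 40k | official | pace | overall | age | gender | genderdiv | division | country | state | city | bib
0 | 0.003409 | 0.007832 | 0.008234 | 0.008508 | 0.007689 | 0.008533 | 0.010421 | 0.010017 | 0.010147 | 0.010863 | 8 | 0.460317 | M | 8 | 8 | JPN | others | Fukuoka | W1
1 | 0.106944 | 0.166667 | 0.157928 | 0.158276 | 0.150604 | 0.134029 | 0.128172 | 0.129555 | 0.127324 | 0.126930 | 21 | 0.238095 | F | 1 | 1 | KEN | others | Eldoret | F1
2 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1 | 0.365079 | M | 1 | 1 | RSA | others | Paarl | W2
3 | 0.106692 | 0.166562 | 0.157928 | 0.158276 | 0.150604 | 0.134029 | 0.128172 | 0.131521 | 0.130881 | 0.130932 | 27 | 0.095238 | F | 3 | 3 | ETH | others | Shoa | F2
4 | 0.003409 | 0.005221 | 0.002550 | 0.002870 | 0.002344 | 0.001813 | 0.001074 | 0.001434 | 0.001375 | 0.001144 | 2 | 0.349206 | M | 2 | 2 | JPN | others | Nogata Fukuoka | W3
5 | 0.106944 | 0.166667 | 0.157928 | 0.158276 | 0.150687 | 0.134029 | 0.128172 | 0.131521 | 0.131099 | 0.130932 | 28 | 0.174603 | F | 4 | 4 | KEN | others | Nandi | F3
6 | 0.007955 | 0.010756 | 0.008234 | 0.008508 | 0.007689 | 0.006720 | 0.007358 | 0.008629 | 0.008837 | 0.008576 | 4 | 0.158730 | M | 4 | 4 | SUI | others | Neuenkirch | W4
7 | 0.093687 | 0.144737 | 0.135564 | 0.135018 | 0.128073 | 0.112743 | 0.104868 | 0.108039 | 0.107423 | 0.108062 | 5 | 0.174603 | M | 5 | 5 | ETH | others | Addis Ababa | 5
8 | 0.003157 | 0.005221 | 0.002656 | 0.002870 | 0.002344 | 0.001813 | 0.001074 | 0.001388 | 0.001375 | 0.001144 | 3 | 0.396825 | M | 3 | 3 | JPN | others | Isahaya | W6
9 | 0.093434 | 0.144737 | 0.136414 | 0.136931 | 0.131075 | 0.119807 | 0.117436 | 0.124303 | 0.124880 | 0.125214 | 20 | 0.206349 | M | 20 | 20 | USA | CA | Redding | 6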

Last rows

(index) | 5k | 10k | 20k | half | 25k | 30k | 35k | 40k | official | pace | overall | age | gender | genderdiv | division | country | state | city | bib
31638 | 0.316288 | 0.538429 | 0.583373 | 0.586337 | 0.586013 | 0.540368 | 0.533319 | 0.552527 | 0.548202 | 0.548313 | 29638 | 0.206349 | M | 16541 | 5604 | USA | MA | Dorchester | 35901
31639 | 0.332955 | 0.552527 | 0.576627 | 0.578886 | 0.595140 | 0.566468 | 0.578356 | 0.602707 | 0.601379 | 0.601487 | 30827 | 0.603175 | M | 17092 | 1725 | USA | MA | Reading | 35902
31640 | 0.301136 | 0.494256 | 0.490465 | 0.490687 | 0.476729 | 0.450098 | 0.426226 | 0.460197 | 0.454417 | 0.454545 | 25975 | 0.238095 | M | 14730 | 4993 | USA | MA | Hyde Park | 35905
31641 | 0.278662 | 0.487782 | 0.501195 | 0.498137 | 0.491736 | 0.475354 | 0.469299 | 0.487033 | 0.485140 | 0.485420 | 27490 | 0.301587 | M | 15472 | 5217 | USA | MA | Boston | 35906
31642 | 0.370581 | 0.627924 | 0.678247 | 0.681736 | 0.675397 | 0.628731 | 0.623576 | 0.638704 | 0.640700 | 0.640366 | 31296 | 0.412698 | M | 17288 | 2598 | USA | MA | Wayland | 35907
31643 | 0.232071 | 0.356099 | 0.337052 | 0.336035 | 0.321602 | 0.288188 | 0.281428 | 0.287149 | 0.286179 | 0.285878 | 9849 | 0.222222 | M | 7500 | 3414 | USA | CA | Larkspur | 35908
31644 | 0.294444 | 0.466374 | 0.490146 | 0.492449 | 0.490174 | 0.459413 | 0.464193 | 0.484372 | 0.484245 | 0.484277 | 27456 | 0.253968 | M | 15459 | 5212 | USA | MA | Norwell | 35909
31645 | 0.257955 | 0.442565 | 0.463373 | 0.465868 | 0.457281 | 0.425718 | 0.424707 | 0.440023 | 0.439469 | 0.439680 | 25092 | 0.047619 | F | 10788 | 5231 | USA | CT | West Simsbury | 35910
31646 | 0.293308 | 0.492168 | 0.498274 | 0.498389 | 0.501686 | 0.472197 | 0.470687 | 0.485298 | 0.484704 | 0.484277 | 27476 | 0.317460 | F | 12009 | 5800 | USA | MA | North Andover | 35911
31647 | 0.242045 | 0.386487 | 0.382098 | 0.383105 | 0.366828 | 0.329385 | 0.320128 | 0.324766 | 0.322292 | 0.322470 | 14844 | 0.571429 | M | 9771 | 1396 | USA | PA | Lancaster | 35912